1. INTRODUCTION

In social tagging systems, users collaboratively manage 
tags to annotate resources. Naturally, social tagging systems 
can be modeled as a tripartite hypergraph, where there are 
three different types of nodes, namely users, resources and 
tags, and each hyperedge has three end nodes, connecting a 
user, a resource and a tag that the user employs to annotate 
the resource. 

As for community detection in tripartite hypergraphs, a common strategy is to reduce the tripartite hypergraph to simpler unipartite graphs, bipartite graphs, or tripartite graphs, and then detect communities in the corresponding graphs [S1E03]. One major drawback of this class of methods is that some valuable information in the original hyperedges is lost during reduction [12], so the subsequently detected communities tend to be less accurate. Researchers have also proposed extended modularity optimization [7] and tensor decomposition [3] methods. However, these two methods are biased towards communities with one-to-one correspondence, as shown in Fig. 1(a). Real-world social tagging systems are often more complex than that. For example, a group of users may be interested in one collection of resources about programming technology and another collection about sports. Hence, communities with many-to-many correspondence, as shown in Fig. 1(b), are more significant in practice. Besides, another disadvantage of some previous methods [UG2] is that they require one to specify certain parameters, such as the numbers of communities. In practice, such a priori knowledge is difficult to obtain.

In this paper, we propose a quality function, based on 
the minimum description length (MDL) principle [10], for 
measuring the goodness of different partitions of a tripartite 
hypergraph into communities, and develop a community de- 
tection algorithm based on minimizing the quality function. 
Our method overcomes the limitations of previous methods 
and has the following key properties: 

• Independent: it handles broad families of tripartite hypergraphs, and can detect both communities with one-to-one correspondence and communities with many-to-many correspondence.
Figure 1: Communities with (a) one-to-one correspondence, and (b) many-to-many correspondence.



• Parameter-free: given the structure of a tripartite hyper- 
graph, it can automatically detect communities, without 
any prior knowledge like the numbers of communities. 

• Accurate: it is more accurate than previous methods. 

• Scalable: it is fast and scalable to large-scale hypergraphs. 

2. PROBLEM FORMULATION 

One fundamental issue is the definition of community in tripartite hypergraphs. Generally, a community should be a group of related nodes that corresponds to a functional subunit of the real-world system. In unipartite graphs, a community is often understood as a group of nodes with dense connections among them. But this notion is not suitable for tripartite hypergraphs, since nodes of the same type are not connected. Instead, we consider a tripartite hypergraph community to be a group of nodes that are structurally equivalent (in a weakened sense, and the same hereinafter) [9]. This is a natural assumption, because a group of nodes that are similar to one another with respect to their relations to nodes of other types is very likely to form a functional subunit. For example, in a social tagging system, users with similar tagging actions are very likely to share the same interests, and resources annotated with similar tags are very likely to belong to the same category. Meanwhile, the dense connections between certain communities in different node sets constitute their correspondence.

In the following, we formulate the problem of community detection in tripartite hypergraphs. Assume an undirected and unweighted tripartite hypergraph H = (V^r ∪ V^g ∪ V^b, E), where V^r, V^g and V^b are three disjoint node sets, and E ⊆ {(v^r_i, v^g_j, v^b_k) | v^r_i ∈ V^r, v^g_j ∈ V^g, v^b_k ∈ V^b} is the set of three-way hyperedges. For simplicity, nodes of the three different types, v^r_i ∈ V^r, v^g_j ∈ V^g and v^b_k ∈ V^b, are colored red, green and blue, respectively. Suppose n^r = |V^r|, n^g = |V^g| and n^b = |V^b| are the numbers of red, green and blue nodes, and m = |E| is the number of hyperedges.


Table 1: Notations for a tripartite hypergraph H

Symbol     Meaning
V^r        The red node set
V^g        The green node set
V^b        The blue node set
E          The hyperedge set
n^r        The number of red nodes
n^g        The number of green nodes
n^b        The number of blue nodes
m          The number of hyperedges
v^r_i      The i-th red node
v^g_j      The j-th green node
v^b_k      The k-th blue node
c^r        The number of red communities
c^g        The number of green communities
c^b        The number of blue communities
V^r_α      The α-th red community
V^g_β      The β-th green community
V^b_γ      The γ-th blue community
n^r_α      The number of nodes in V^r_α
n^g_β      The number of nodes in V^g_β
n^b_γ      The number of nodes in V^b_γ
r_i        The i-th element of the vector r, indicating the community membership of v^r_i
g_j        The j-th element of the vector g, indicating the community membership of v^g_j
b_k        The k-th element of the vector b, indicating the community membership of v^b_k
A_ijk      The (i, j, k) element of the three-dimensional array A, indicating the number of hyperedges between v^r_i, v^g_j and v^b_k
M_αβγ      The (α, β, γ) element of the three-dimensional array M, indicating the number of hyperedges between V^r_α, V^g_β and V^b_γ



H can be represented by a three-dimensional binary array A of size n^r × n^g × n^b, with elements

$$A_{ijk} = \begin{cases} 1 & \text{if } (v^r_i, v^g_j, v^b_k) \in E; \\ 0 & \text{otherwise.} \end{cases}$$

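The problem of community detection in H is: given A, how can we find a good partition

$$\mathcal{P} = \{V^r_\alpha\}_{\alpha=1}^{c^r} \cup \{V^g_\beta\}_{\beta=1}^{c^g} \cup \{V^b_\gamma\}_{\gamma=1}^{c^b}$$

that divides V^r, V^g and V^b into disjoint communities, respectively:

$$V^r = \bigcup_{\alpha=1}^{c^r} V^r_\alpha, \qquad V^g = \bigcup_{\beta=1}^{c^g} V^g_\beta, \qquad V^b = \bigcup_{\gamma=1}^{c^b} V^b_\gamma,$$

where communities of the same color are pairwise disjoint.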

Note that the numbers of red, green and blue communities, c^r, c^g and c^b, are not known a priori. The meaning of "good" is twofold: i) nodes in the same community are structurally equivalent; ii) hyperedges between communities are either dense or sparse, so that the correspondence between communities is clear.

Throughout the paper, we use Latin letters i, j and k for indices of red, green and blue nodes, respectively, and Greek letters α, β and γ for indices of red, green and blue communities, respectively. Table 1 summarizes the notations used in this paper.

3. THE PROPOSED METHOD 

In this section, we first define a quality function for mea- 
suring the goodness of different partitions of a tripartite hy- 
pergraph into communities, and then propose an algorithm 
for minimizing the quality function. 



When we describe a graph as a set of communities, we 
are highlighting certain regularities (e.g., the similarities 
of nodes in the same community and the dissimilarities of 
nodes between different communities) while filtering out rel- 
atively unimportant details (e.g., the dissimilarities of nodes 
in the same community). Thus, description of a graph as 
communities can be viewed as a lossy compression of that 
graph's structure, and the community detection problem as 
a problem of finding an efficient compression of the struc- 
ture. This is the main insight of the structural information 
compression method proposed in [11], where the authors 
focus on information compression on a unipartite graph's 
structure. Here we show how to compress the structural in- 
formation of a tripartite hypergraph, in order to formulate 
our quality function. 

Now let us envision a communication process of transmit- 
ting structural information of a tripartite hypergraph H. 
A signaler knows the structure of H and aims to transmit 
much of the information in a reduced fashion to a receiver 
over a noiseless channel. To do so, the signaler makes a 
partition of H into communities and encodes the structural 
information X = {A} as compressed information summa- 
rizing the community structure: Y = {r,g,b,M}, where 
r, g and b are the community membership vectors of red, green and blue nodes, and M is the community connectivity array. For a partition dividing red, green and blue nodes into c^r, c^g and c^b communities, we have r = [r_1, r_2, ..., r_{n^r}], g = [g_1, g_2, ..., g_{n^g}] and b = [b_1, b_2, ..., b_{n^b}], where r_i ∈ {1, 2, ..., c^r}, g_j ∈ {1, 2, ..., c^g} and b_k ∈ {1, 2, ..., c^b} indicate the community memberships of nodes v^r_i, v^g_j and v^b_k, respectively. The community connectivity array M is a three-dimensional array of size c^r × c^g × c^b, with element M_αβγ ∈ {0, 1, 2, ..., m} indicating the number of hyperedges between communities V^r_α, V^g_β and V^b_γ.



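That is,

$$M_{\alpha\beta\gamma} = \sum_{i:\, r_i = \alpha} \; \sum_{j:\, g_j = \beta} \; \sum_{k:\, b_k = \gamma} A_{ijk}.$$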
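Because only the m nonzero entries of A contribute to this sum, M can be accumulated in a single pass over the hyperedges instead of over all n^r n^g n^b cells of A. The sketch below assumes, for illustration only, that E is stored as a list of 0-based index triples (i, j, k) and that r, g and b are Python lists of community labels; the function name community_connectivity is hypothetical.

```python
from collections import Counter

def community_connectivity(edges, r, g, b):
    """Return M as a sparse Counter: (alpha, beta, gamma) -> M_{alpha beta gamma}.

    edges: list of (i, j, k) triples, one per hyperedge (v^r_i, v^g_j, v^b_k).
    r, g, b: community labels of the red, green and blue nodes.
    Community triples that never occur are absent; Counter reports them as 0.
    """
    return Counter((r[i], g[j], b[k]) for i, j, k in edges)

# Toy hypergraph: four nodes of each color, grouped into communities 1 and 2.
edges = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (2, 2, 2), (3, 3, 3)]
r = g = b = [1, 1, 2, 2]
M = community_connectivity(edges, r, g, b)
print(M[(1, 1, 1)], M[(2, 2, 2)], M[(1, 2, 1)])  # prints: 3 2 0
```

The entries of M always sum to m, since every hyperedge falls into exactly one community triple.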



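Each red, green or blue node's community label takes one of c^r, c^g or c^b values and thus costs log c^r, log c^g or log c^b bits, and each of the c^r c^g c^b entries of M is an integer in {0, 1, ..., m} and thus costs log(m + 1) bits. The description length (in bits) of the compressed information Y is therefore

$$L(Y) = n^r \log c^r + n^g \log c^g + n^b \log c^b + c^r c^g c^b \log(m+1),$$

where the logarithm is taken in base 2.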
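As a quick numeric check of this formula, the hypothetical helper below evaluates L(Y) for a small hypergraph with n^r = n^g = n^b = 4 and m = 7 (values chosen only so that log(m + 1) = 3). It contrasts two communities per color with a single community per color.

```python
from math import log2

def description_length_y(n_r, n_g, n_b, c_r, c_g, c_b, m):
    """L(Y) in bits: community labels of all nodes plus every entry of M."""
    label_bits = n_r * log2(c_r) + n_g * log2(c_g) + n_b * log2(c_b)
    # M has c_r * c_g * c_b entries, each an integer in {0, 1, ..., m}.
    array_bits = c_r * c_g * c_b * log2(m + 1)
    return label_bits + array_bits

print(description_length_y(4, 4, 4, 2, 2, 2, 7))  # 12 + 8 * 3 = 36.0
print(description_length_y(4, 4, 4, 1, 1, 1, 7))  # 0 + 1 * 3 = 3.0
```

Taken alone, L(Y) always favors fewer communities; the second term, L(X|Y), supplies the opposing pressure.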

After receiving Y, the receiver knows the community membership of each node and the number of hyperedges between each community triple. The receiver then tries to recover the original structural information X by constructing possible candidates. The number of different candidates is given by

$$\prod_{\alpha=1}^{c^r} \prod_{\beta=1}^{c^g} \prod_{\gamma=1}^{c^b} \binom{n^r_\alpha n^g_\beta n^b_\gamma}{M_{\alpha\beta\gamma}},$$

where n^r_α = |V^r_α|, n^g_β = |V^g_β| and n^b_γ = |V^b_γ| are the numbers of nodes in communities V^r_α, V^g_β and V^b_γ, and the parentheses denote the binomial coefficient. Each binomial coefficient gives the number of different candidates for recovering the original M_αβγ hyperedges between V^r_α, V^g_β and V^b_γ, since these hyperedges may occupy any M_αβγ of the n^r_α n^g_β n^b_γ possible node triples. Hence, the description length of the additional information the receiver needs to recover X (i.e. the conditional description length of X given Y) is

$$L(X|Y) = \log \prod_{\alpha=1}^{c^r} \prod_{\beta=1}^{c^g} \prod_{\gamma=1}^{c^b} \binom{n^r_\alpha n^g_\beta n^b_\gamma}{M_{\alpha\beta\gamma}}.$$
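Since these binomial coefficients grow very quickly, it is convenient to evaluate the logarithm of the product as a sum of logarithms. A minimal sketch, assuming M is given as a dictionary keyed by community triples (a missing key meaning 0) and community sizes as dictionaries from label to size; the helper name is hypothetical. A block that is completely full or completely empty contributes 0 bits, while a half-full block is the most expensive to describe.

```python
from math import comb, log2

def description_length_x_given_y(M, size_r, size_g, size_b):
    """L(X|Y) in bits: sum over community triples of log2 C(n_a * n_b * n_c, M_abc)."""
    bits = 0.0
    for alpha, n_alpha in size_r.items():
        for beta, n_beta in size_g.items():
            for gamma, n_gamma in size_b.items():
                triples = n_alpha * n_beta * n_gamma  # possible hyperedges in this block
                observed = M.get((alpha, beta, gamma), 0)
                bits += log2(comb(triples, observed))
    return bits

sizes = {1: 2, 2: 2}  # two communities of two nodes, for every color
print(description_length_x_given_y({(1, 1, 1): 8, (2, 2, 2): 8}, sizes, sizes, sizes))
print(description_length_x_given_y({(1, 1, 1): 4}, sizes, sizes, sizes))
```

The first call describes two perfectly full blocks and yields 0 bits; the second has one half-full block of 8 possible triples and yields log2 C(8, 4) = log2 70, about 6.13 bits.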



Algorithm 1: Detecting communities in a tripartite hypergraph H by minimizing quality function Q

Input: Connectivity array A of H
Output: Partition of H into communities
begin
    // Phase 1
    assign each node in H a unique label;
    repeat
        update each node's label;
    until a local minimum of Q;
    repeat
        // Phase 2
        build a reduced tripartite hypergraph H';
        assign each node in H' a unique label;
        repeat
            update each node's label;
        until a local minimum of Q;
        // Phase 1
        retrieve labels in H from the corresponding labels in H';
        repeat
            update each node's label;
        until a local minimum of Q;
    until no change in Q;
    identify communities as groups of nodes bearing the same labels;
end



The objective is for the signaler to transmit as little as possible while the receiver recovers as much as possible. Intuitively, if the signaler makes a "good" partition as described in Section 2, which capitalizes on regularities in the hypergraph's structure, the compression based on it achieves the optimal trade-off between L(Y) and L(X|Y). According to the minimum description length (MDL) principle [10], such a partition minimizes

$$\begin{aligned} Q(\mathcal{P}) &= L(Y) + L(X|Y) \\ &= n^r \log c^r + n^g \log c^g + n^b \log c^b + c^r c^g c^b \log(m+1) \\ &\quad + \log \prod_{\alpha=1}^{c^r} \prod_{\beta=1}^{c^g} \prod_{\gamma=1}^{c^b} \binom{n^r_\alpha n^g_\beta n^b_\gamma}{M_{\alpha\beta\gamma}}. \end{aligned} \tag{1}$$

This is the quality function for measuring the goodness of a partition 𝒫 of a tripartite hypergraph into communities.
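To illustrate the trade-off captured by Eq. (1), the sketch below evaluates Q on a toy hypergraph made of two fully connected blocks (red, green and blue nodes {0, 1} form one block, nodes {2, 3} the other). The function name quality and the list-based input format are assumptions for illustration, and c^r, c^g and c^b are taken to be the numbers of distinct labels in use.

```python
from collections import Counter
from itertools import product
from math import comb, log2

def quality(edges, r, g, b):
    """Q = L(Y) + L(X|Y) of Eq. (1), in bits, for label lists r, g and b."""
    m = len(edges)
    size_r, size_g, size_b = Counter(r), Counter(g), Counter(b)
    c_r, c_g, c_b = len(size_r), len(size_g), len(size_b)
    M = Counter((r[i], g[j], b[k]) for i, j, k in edges)
    # L(Y): one label per node plus every entry of the c_r x c_g x c_b array M
    l_y = (len(r) * log2(c_r) + len(g) * log2(c_g) + len(b) * log2(c_b)
           + c_r * c_g * c_b * log2(m + 1))
    # L(X|Y): log of the product of binomial coefficients, as a sum of logs
    l_x_given_y = sum(
        log2(comb(size_r[a] * size_g[be] * size_b[ga], M[(a, be, ga)]))
        for a in size_r for be in size_g for ga in size_b)
    return l_y + l_x_given_y

# Two planted blocks: every (i, j, k) with all indices in {0, 1}, or all in {2, 3}.
edges = list(product([0, 1], repeat=3)) + list(product([2, 3], repeat=3))
planted = [0, 0, 1, 1]
single = [0, 0, 0, 0]
singletons = [0, 1, 2, 3]
for name, labels in [("planted", planted), ("single", single), ("singletons", singletons)]:
    print(name, round(quality(edges, labels, labels, labels), 1))
```

The planted partition scores about 44.7 bits, below both the single-community partition (about 52.9 bits, dominated by L(X|Y)) and the all-singleton partition (about 285.6 bits, dominated by L(Y)).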

Now we can evaluate a partition based on the quality function Q, where a low value of Q indicates a good partition. The task is therefore to search over all possible partitions for one that minimizes Q. However, as with modularity optimization, finding the globally optimal solution is NP-hard [2]. We therefore develop an approximate algorithm that can be implemented in near-linear time, as presented in Algorithm 1. For more information, please refer to [TJ[S].
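Algorithm 1 leaves the label-update step abstract, and this section does not specify it. Purely as an illustration, the sketch below implements a Phase-1-style search under our own assumptions: every node starts with a unique label, and each node in turn moves to whichever label already used by its own color lowers Q the most (best-improvement single-node moves), until no move helps. Phase 2 (building the reduced hypergraph H') is omitted, and Q is recomputed from scratch at every trial, so this sketch is far from the near-linear running time reported for the actual algorithm. The names quality and local_moves are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb, log2

def quality(edges, labels):
    """Q = L(Y) + L(X|Y) in bits; labels = [r, g, b] lists of community labels."""
    sizes = [Counter(lab) for lab in labels]
    r, g, b = labels
    M = Counter((r[i], g[j], b[k]) for i, j, k in edges)
    q = sum(len(lab) * log2(len(s)) for lab, s in zip(labels, sizes))
    q += len(sizes[0]) * len(sizes[1]) * len(sizes[2]) * log2(len(edges) + 1)
    for alpha, na in sizes[0].items():
        for beta, nb in sizes[1].items():
            for gamma, nc in sizes[2].items():
                q += log2(comb(na * nb * nc, M[(alpha, beta, gamma)]))
    return q

def local_moves(edges, n_r, n_g, n_b):
    """Start from unique labels; move single nodes while Q strictly decreases."""
    labels = [list(range(n)) for n in (n_r, n_g, n_b)]
    best = quality(edges, labels)
    improved = True
    while improved:
        improved = False
        for t in range(3):  # 0 = red, 1 = green, 2 = blue
            for v in range(len(labels[t])):
                original = labels[t][v]
                best_label = original
                # try every other label currently used by nodes of this color
                for cand in sorted(set(labels[t]) - {original}):
                    labels[t][v] = cand
                    q = quality(edges, labels)
                    if q < best - 1e-9:
                        best, best_label = q, cand
                labels[t][v] = best_label  # keep the best move, or revert
                if best_label != original:
                    improved = True
    return labels, best

edges = list(product([0, 1], repeat=3)) + list(product([2, 3], repeat=3))
labels, q = local_moves(edges, 4, 4, 4)
print(labels, round(q, 1))
```

On the two-block toy hypergraph, these single-node moves group nodes {0, 1} and {2, 3} together for each color. Since only single nodes move, the search stops at a local minimum of Q, matching the "until a local minimum of Q" condition in Algorithm 1.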

4. EVALUATION 

In this section, we concentrate on comparing our method with previous ones in terms of accuracy. The basic scheme is as follows: we apply various methods to a set of synthetic tripartite hypergraphs with known community structure (the true partition), and compare the partitions obtained by different methods with the true partition; the closer an obtained partition is to the true partition, the better the corresponding method. To quantify the similarity between two partitions, we use normalized mutual information (NMI) [3], which has a maximum value of 1 if two partitions match completely, and a minimum value of 0 if they are totally independent of one another.
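This section does not state which normalization of NMI is used. The sketch below assumes the widely used form NMI = 2 I(X;Y) / (H(X) + H(Y)), computed from the contingency counts of two labelings of the same nodes; the function name nmi and the convention of returning 1 when both partitions consist of a single community are our own choices.

```python
from collections import Counter
from math import log

def nmi(x, y):
    """NMI = 2 I(X;Y) / (H(X) + H(Y)) between two labelings of the same nodes."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    h_x = -sum(c / n * log(c / n) for c in px.values())
    h_y = -sum(c / n * log(c / n) for c in py.values())
    # mutual information from the joint counts; the log base cancels in the ratio
    mi = sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())
    if h_x + h_y == 0:  # both partitions are a single community
        return 1.0
    return 2 * mi / (h_x + h_y)

print(nmi([0, 0, 1, 1], [5, 5, 7, 7]))  # same grouping, different label names
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # statistically independent groupings
```

The first call returns 1, since only the label names differ; the second returns 0, since knowing one grouping says nothing about the other.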

We consider several competing methods that represent the state of the art. They are, in order, the extended modularity optimization method (ExModularity) proposed by Murata [7], the tensor decomposition method (MetaFac) presented by Lin et al. [1], and the method advanced by Neubauer et al., which involves reduction of a tripartite hypergraph to bipartite graphs (BiNetReduction) [8]. In addition, we consider another method (UniNetReduction), modified from Zlatic's approach [13], which involves reduction of a tripartite hypergraph to unipartite graphs.

In the first experiment, we compare these methods on a set of synthetic tripartite hypergraphs with built-in communities of one-to-one correspondence (the detailed procedure for generating the dataset is omitted here). Fig. 2 shows the performance of the various methods on this set of hypergraphs. On the whole, the performance of each method varies in a similar way across the red, green and blue node sets, since red, green and blue nodes play symmetric roles in the hypergraph generation procedure. Specifically, ExModularity, BiNetReduction and our method perform excellently, correctly detecting not only the numbers of communities but also the community membership of each node almost all the way to the point p_dense ≈ 0.05. At the turning stage, i.e. as p_dense falls from 0.045 to 0.03, our method slightly outperforms ExModularity and BiNetReduction, as shown in the embedded figures. Thereafter, the performance of all three methods deteriorates markedly. MetaFac, though given prior knowledge of the true numbers of communities, does not provide remarkable results. The record for UniNetReduction is even worse: its performance starts to decrease as early as p_dense ≈ 0.055, and when p_dense < 0.3 it loses most of the information about the true partition.

In the second experiment, we generate a set of synthetic tripartite hypergraphs with built-in communities of many-to-many correspondence (the detailed procedure for generating the dataset is omitted here). Applying the different methods to this set of synthetic hypergraphs, we calculate the NMI between the obtained partitions and the true partition. The results are shown in Fig. 3 (values are averaged over 20 runs). As Fig. 3 shows, our method outperforms the others by a large margin. It works almost perfectly all the way until p_dense = 0.015, with a sudden dramatic fall thereafter. For the other methods, we observe three common features: 1) none of them detects community membership with 100% accuracy, even when p_dense = 0.08; 2) they perform best on the red node set, second best on the green node set, and worst on the blue node set; 3) their performance deteriorates much earlier than that of our method, often fluctuating wildly before the turning points. Specifically, UniNetReduction is the best among them most of the time, followed by MetaFac. Considering that MetaFac is given at least an estimate of the true numbers of communities, its performance is not appealing. In contrast to their excellent performance on the previous set of hypergraphs, BiNetReduction and ExModularity do not perform satisfactorily this time.

5. CONCLUSION 

Based on the information compression idea, we define 
a quality function for measuring the goodness of different 
partitions of a tripartite hypergraph into communities, and 
develop an algorithm for minimizing this quality function. 
Figure 3: Performance on the synthetic dataset with built-in communities of many-to-many correspondence.



Compared with previous methods, our method can handle both communities with one-to-one correspondence and communities with many-to-many correspondence. It should be emphasized that our method is parameter-free. In the future, we would like to apply our method to real-world social tagging systems.